1 Intro

This work examines the data and interprets the results of a study conducted by Dalia Research in April 2016.

2 Who are our respondents?

2.1 Distribution by age and gender

The distribution of respondents by age among men and women is almost the same.

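# DISTRIBUTION BY AGE + GENDER
library(data.table)  # income_data is queried with data.table syntax throughout
library(ggplot2)
library(plotly)      # provides ggplotly() for the interactive plots below

# Density-scaled histogram with a smoothed density overlay, one panel per gender
ggplot(data = income_data, aes(x = age)) + 
  ggtitle('The distribution of respondents by age') +
  geom_histogram(aes(y = after_stat(density))  # replaces the deprecated ..density..
                 ,col = 'black'
                 ,fill = 'white') +
  geom_density(alpha = 0.2, fill='#FF6666') + 
  facet_grid(gender ~.)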

There is a slight but noticeable shift towards men.

# DISTRIBUTION BY GENDER (SHARE)
ggplotly(
  ggplot(data = income_data, aes(x = age, fill = gender)) + 
    ggtitle('Share of male and female respondents by age') +
    labs(x = 'Age', y = 'Share') +
    # position = 'fill' rescales every one-year bin to a total height of 1
    geom_histogram(position = 'fill', alpha=0.7, binwidth = 1) + 
    scale_x_continuous(breaks=seq(10 , max(income_data[,age]), 5)) + 
    # white reference line at an even 50/50 split
    geom_hline(aes(yintercept = 0.5), colour="white")
)
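The shares drawn by position = 'fill' can also be computed as numbers, which makes a small shift easier to judge than reading it off the plot. The following is a minimal sketch in base R on an invented toy data frame (the name toy and all its values are hypothetical, not taken from income_data); it assumes only that the data has an age and a gender column.

```r
# Toy data; invented values standing in for income_data
toy <- data.frame(
  age    = c(20, 20, 20, 30, 30, 30, 30, 40),
  gender = c("male", "female", "male", "male", "female", "female", "male", "male")
)

# margin = 1 normalises each row (each age), so the shares sum to 1 per age.
# This is the same quantity that position = 'fill' shows as bar heights.
gender_share <- prop.table(table(toy$age, toy$gender), margin = 1)
round(gender_share, 2)
```

On the real data, the same call with income_data$age and income_data$gender should give one row per age, with values above 0.5 in the male column wherever men are over-represented.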

2.2 What about education?

I renamed the values of the dem_education_level field so that they sort in rank order.

# Prefix each level with its rank so that alphabetical sorting matches the ranking.
# 'no' is left unchanged: it already sorts after the numbered labels.
income_data[dem_education_level == 'low', dem_education_level:="3. low"]
income_data[dem_education_level == 'medium', dem_education_level:="2. medium"]
income_data[dem_education_level == 'high', dem_education_level:="1. high"]
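An alternative to numeric prefixes is to keep the original labels and declare their order with an ordered factor. This is a sketch of that swapped-in technique, not the approach used in this report; the vector edu is a hypothetical stand-in for the dem_education_level column.

```r
# Toy vector; invented values standing in for income_data$dem_education_level
edu <- c("medium", "high", "no", "low", "high", "medium")

# State the rank order explicitly, from highest to lowest
edu_ranked <- factor(edu,
                     levels = c("high", "medium", "low", "no"),
                     ordered = TRUE)

levels(edu_ranked)  # order used by table() and by ggplot2 legends
table(edu_ranked)   # counts are reported in the declared order
```

With this approach the legend shows the plain labels, and the ranking lives in the factor levels rather than in the text of each value.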

Now let’s see how our respondents are distributed by age and level of education.

ggplotly(
  ggplot(data = income_data, aes(x = age, fill = dem_education_level)) + 
    ggtitle('Distribution by age and level of education') + 
    # x is age and y is the count per bin, so the axes are labelled accordingly
    labs(x = 'Age', y = 'Number of respondents') + 
    geom_histogram(bins = 50
                   ,alpha = 0.7)
)

Let’s look at the same distribution as shares. Most respondents have a medium or high level of education.

ggplotly(
  ggplot(data = income_data, aes(x = age, fill = dem_education_level)) + 
    ggtitle('Share of education level by age') + 
    # x is the raw age here, not the age_group field
    labs(x = 'Age', y = 'Share') + 
    geom_histogram(bins = 50
                   ,position = "fill"
                   ,alpha = 0.7)
)

Let’s see how the education levels are distributed within each age group of the respondents.

round(prop.table(table(income_data$dem_education_level, income_data$age_group),margin = 2),2)
##            
##             14_25 26_39 40_65
##   1. high    0.25  0.46  0.35
##   2. medium  0.41  0.36  0.42
##   3. low     0.29  0.15  0.20
##   no         0.05  0.03  0.03
prob_t <-  data.table(round(prop.table(table(income_data$dem_education_level, income_data$age_group),margin = 2),2))
names(prob_t) <- c('dem_education_level', 'age_group', 'probability')
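The margin = 2 argument is what makes each column of the table sum to 1, so every value is the share of an education level within one age group. The sketch below shows this in base R on invented toy data. The report does not show how age_group was built, so the cut() call and its break points are an assumption inferred from the column labels 14_25, 26_39 and 40_65.

```r
# Toy data; the values are invented for illustration only
toy <- data.frame(
  age = c(16, 22, 30, 35, 41, 58, 24, 39),
  edu = c("low", "medium", "high", "high", "medium", "low", "medium", "medium")
)

# Assumed binning: (13, 25], (25, 39], (39, 65], labelled like the table above
toy$age_group <- cut(toy$age,
                     breaks = c(13, 25, 39, 65),
                     labels = c("14_25", "26_39", "40_65"))

# margin = 2 normalises each column, so shares sum to 1 within an age group
shares <- prop.table(table(toy$edu, toy$age_group), margin = 2)
round(shares, 2)
colSums(shares)
```

Using margin = 1 instead would normalise each row, answering a different question: how one education level is spread across the age groups.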

Now let’s display the shares as a bar plot.

ggplotly(
ggplot(data = prob_t, aes(x = age_group, y = probability, fill = dem_education_level)) + 
  ggtitle('Share of education level by age group') + 
  labs(x = 'Age Group', y = 'Share') + 
  geom_bar(stat = 'identity'
           ,alpha=0.7
           ,col = 'black')
)

To be continued…